Major llama.cpp API Change #185
Conversation
- params contains a list of LoRAs, instead of just one
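As a rough illustration of that change (the type and property names below are hypothetical, not the actual LLamaSharp API), the parameters would carry a collection of adapters rather than a single path:

```csharp
using System.Collections.Generic;

// Hypothetical shapes for illustration only.
public sealed record LoraAdapterSketch(string Path, float Scale);

public class ParamsSketch
{
    // Previously a single adapter path; now a list so several LoRA adapters
    // (each with its own scale) can be applied when the weights are loaded.
    public List<LoraAdapterSketch> LoraAdapters { get; } = new();
}
```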
@saddam213 I've made some changes which have broken some of the LLama.Web stuff at the moment, it looks like it's all places which construct a single context by loading an entirely new set of model weights (probably legacy code?). I can probably throw in some workarounds if you want, or do you want to do something more complete and push it into this PR?
No probs man, I'll get it done tonight, won't take long to get it up to date with your changes :) The web project has fallen behind a bit, I need to get it up to date, got lost in the world of stable diffusion the last few weeks, lol
@saddam213 I actually just pushed up some workarounds to at least get it building, but I left some todos in place.
haha I know how that is. I bounce back and forth between LLMs and stable diffusion all the time 😆
Wondering if we should merge the Web and WebApi into one project, there is a lot of code that will need to be shared to get them both up to date, and we can just use an appsettings config to decide what it starts up, API or WEB. Or add a shared project, but that seems overkill for examples. I can't even port the LLamaStack api or web because I went and made custom executors like an idiot :/
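A minimal sketch of the appsettings idea, assuming a single merged ASP.NET Core project and a hypothetical "StartupMode" key in appsettings.json (not anything that exists in LLama.Web today):

```csharp
// Minimal sketch: one project that starts as either the web UI or the API
// depending on a hypothetical "StartupMode" configuration value.
var builder = WebApplication.CreateBuilder(args);
var mode = builder.Configuration["StartupMode"] ?? "Web";

if (string.Equals(mode, "Api", StringComparison.OrdinalIgnoreCase))
    builder.Services.AddControllers();   // API-only startup
else
    builder.Services.AddRazorPages();    // web UI startup

var app = builder.Build();

if (string.Equals(mode, "Api", StringComparison.OrdinalIgnoreCase))
    app.MapControllers();
else
    app.MapRazorPages();

app.Run();
```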
Ok, so the changes to get the web up to date with version 0.5.1 are more than I thought. I think it's too big to mash into this PR, so I will create it as its own PR, then we can merge that in, and then we can merge this PR's changes over the top. That work for you?
That's fine by me, there are workarounds in place in this PR so that it at least builds and we can get your PR in right after this one is done. I'll try to get this one finished and merged soon so we can get on with that :)
The only thing blocking this PR is proper testing on MacOS. |
I will be at home tomorrow; I can check this PR this week.
At least with today's llama.cpp binaries it fails in two tests, both related to the tokenizer. This could be related to the latest changes in the tokenizer: ggerganov/llama.cpp#3538. These are the errors: Xunit.Sdk.EqualException and Xunit.Sdk.EqualException
Thanks for testing that, I'll retrigger the GitHub action to recompile all of the binaries and fix the tokenizer tests. Did you test with the included binaries at all to confirm that everything else works?
I haven't tested it yet with the included binaries; I used my daily build binaries. Tomorrow I will test the binaries generated by the GitHub build and make some tests with additional projects and some Semantic Kernel examples.
Hi, that PR only applies if you explicitly pass "special=true" to llama_tokenize. Using "special=true" is meant only for "internal" use, and should not be exposed to the end user (application user). Enabling special token processing means that special tokens written in the input string are parsed as their control token IDs instead of being tokenized as plaintext.
This addition to the API is meant for processing things like prompt templates, and allows including BOS/EOS directly in the input string. User text should be tokenized normally, with "special=false".
Clarification on "special" tokens: the model vocabulary naturally has a tree-like structure; every normal token can be represented by two other, shorter tokens, down to single characters. The tokenizer picks neighboring characters from the input string, finds a token that represents that substring, and repeats merging neighbors until no valid longer token can be matched. Special tokens break that rule, and are chosen to never be "reachable" by merging normal tokens, preventing the user from inserting control tokens in the prompt. Previously, special tokens were simply tokenized as plaintext. This change allows use of special (control) tokens directly in the input string, removing the need to fiddle with querying the vocabulary manually to make prompt templates work.
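To make the merging description above concrete, here is a toy, self-contained C# sketch; the vocabulary, input, and greedy merge strategy are all made up for illustration and do not reflect the actual llama.cpp tokenizer internals.

```csharp
using System;
using System.Collections.Generic;

// Toy illustration (NOT the real llama.cpp tokenizer) of the merging idea
// described above: normal tokens are produced by repeatedly merging
// neighbouring pieces, while a special token such as "<s>" is deliberately
// unreachable through such merges.
class TokenMergeDemo
{
    static void Main()
    {
        // Tiny made-up vocabulary of "normal" tokens.
        var vocab = new HashSet<string> { "h", "e", "l", "o", "he", "ll", "hell", "hello", "<", "s", ">" };

        // Start from the single characters of the input string.
        var pieces = new List<string>();
        foreach (var c in "hello<s>")
            pieces.Add(c.ToString());

        // Greedily merge neighbours while the merged string is still a known token.
        bool merged = true;
        while (merged)
        {
            merged = false;
            for (int i = 0; i + 1 < pieces.Count; i++)
            {
                var candidate = pieces[i] + pieces[i + 1];
                if (vocab.Contains(candidate))
                {
                    pieces[i] = candidate;
                    pieces.RemoveAt(i + 1);
                    merged = true;
                    break;
                }
            }
        }

        // "<s>" is intentionally not a normal vocabulary entry, so with
        // special=false it remains the plain pieces "<", "s", ">".
        Console.WriteLine(string.Join(" | ", pieces)); // hello | < | s | >
    }
}
```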
- Fixed stateless executor out-of-context handling
- Fixed token tests
@SignalRT I've added new binaries and fixed a few small things (e.g. I think out-of-context handling should be about 100x faster now). @staviq thanks very much for the extra info! I think the problem in this case was because SignalRT was using newer binaries than this PR was built for, so I hadn't added that new parameter yet on the C# side! I'm not quite sure what happens in this case (the call was still successful, so I guess whatever junk was on the stack gets used). Once I added the parameter and passed "false" everything worked as expected.
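For reference, a rough sketch of what the updated native declaration might look like on the C# side; the library name, parameter names, and marshalling attributes are assumptions, and this is not the actual LLamaSharp NativeApi code.

```csharp
using System;
using System.Runtime.InteropServices;

internal static class NativeSketch
{
    // Sketch only: mirrors the shape of llama_tokenize after the "special"
    // flag was added upstream, not the actual LLamaSharp declaration.
    // If the managed declaration is missing the trailing parameter, the call
    // can still "succeed" while the native side reads whatever value happens
    // to be in that register or stack slot, which is the bug described above.
    [DllImport("libllama", CallingConvention = CallingConvention.Cdecl)]
    public static extern int llama_tokenize(
        IntPtr model,
        IntPtr text,
        int textLength,
        IntPtr tokens,
        int maxTokens,
        [MarshalAs(UnmanagedType.U1)] bool addBos,
        [MarshalAs(UnmanagedType.U1)] bool special); // pass false for user text
}
```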
@martindevans, all my tests are working. I made some changes in the SK packages, but I see that master is right now on SK 1.0 beta, so it makes no sense to change it in this branch.
Thanks for testing that!
It's great work! Is it a good time to publish a new release?
And maybe I should add an auto-release to CI; I'll do it this weekend.
I think so yeah, this has made a few breaking changes to the API but nothing massive. Good time to push out a new version 👍
I'm making the auto-release CI now and I think there's something confusing me. There are two supported backends for Mac, CPU and Metal, but there is only one binary.
@AsakusaRinne It was removed in this PR: #163. From the discussion there it looks like one binary supports CPU and Metal inference (but I don't have a Mac so I can't confirm that).
@AsakusaRinne, one binary supports CPU and GPU.
@martindevans @SignalRT Thank you for your help! I've opened PR #204 for the auto-release CI. 😊
llama.cpp ggerganov/llama.cpp#3228 (comment) just got merged with major changes to the API. See #184 with some notes on this.
This PR has basic changes to match the underlying API changes. Things to do:
- LLamaContextParams and LLamaModelParams have been split into 2, so we should split IModelParams in the same way (a sketch of what that split could look like follows below)
- OutOfContext test

Built by this run based on this commit
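A rough sketch of what that split could look like; the interface and member names here are illustrative guesses, not the final LLamaSharp API.

```csharp
// Illustrative only: model-level settings (tied to the loaded weights)
// separated from context-level settings, mirroring the native
// LLamaModelParams / LLamaContextParams split.
public interface IModelParams
{
    string ModelPath { get; }      // path to the weights file
    int GpuLayerCount { get; }     // layers to offload to the GPU
    bool UseMemoryLock { get; }    // mlock the weights in RAM
}

public interface IContextParams
{
    uint ContextSize { get; }      // context length per LLamaContext
    uint BatchSize { get; }        // prompt-processing batch size
    uint Seed { get; }             // RNG seed for this context
}
```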